Expert Knowledge-Guided Length-Variant Hierarchical Label Generation for Proposal Classification
To advance the development of science and technology, research proposals are
submitted to open-court competitive programs developed by government agencies
(e.g., NSF). Proposal classification is one of the most important tasks to
achieve effective and fair review assignments. Proposal classification aims to
classify a proposal into a length-variant sequence of labels. In this paper, we
formulate the proposal classification problem into a hierarchical multi-label
classification task. Although prior studies exist, proposal
classification exhibits unique features: 1) the classification result of a
proposal is in a hierarchical discipline structure with different levels of
granularity; 2) proposals contain multiple types of documents; 3) domain
experts can empirically provide partial labels that can be leveraged to improve
task performance. In this paper, we focus on developing a new deep proposal
classification framework to jointly model the three features. In particular, to
sequentially generate labels, we leverage previously-generated labels to
predict the label of next level; to integrate partial labels from experts, we
use the embedding of these empirical partial labels to initialize the state of
neural networks. Our model automatically identifies the best label-sequence
length at which to stop predicting further labels. Finally, we present
extensive results to demonstrate that our method can jointly model partial
labels, textual information, and semantic dependencies in label sequences, and
thus achieve advanced performance.
Comment: 10 pages, Accepted as regular paper by ICDM 202
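The level-by-level generation with automatic stopping described in this abstract can be illustrated with a minimal Python sketch. The scoring function `score_next`, the `<stop>` label, the toy discipline codes, and the choice to seed the path directly with expert labels (instead of initializing a neural state from their embeddings) are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Callable, Dict, List, Sequence

STOP = "<stop>"  # hypothetical label meaning "do not go deeper"


def generate_label_path(
    score_next: Callable[[Sequence[str]], Dict[str, float]],
    expert_prefix: Sequence[str] = (),
    max_depth: int = 4,
) -> List[str]:
    """Greedily extend a label path one hierarchy level at a time."""
    path = list(expert_prefix)  # partial expert labels seed the sequence
    while len(path) < max_depth:
        scores = score_next(path)           # scores conditioned on earlier labels
        best = max(scores, key=scores.get)  # greedy choice for the next level
        if best == STOP:                    # the model decides the path is long enough
            break
        path.append(best)
    return path


# Toy scores standing in for a trained model (hypothetical discipline codes).
TOY_SCORES = {
    (): {"F": 0.7, "A": 0.3},
    ("F",): {"F01": 0.6, "F02": 0.3, STOP: 0.1},
    ("F", "F01"): {STOP: 0.8, "F0101": 0.2},
}


def toy_score_next(path: Sequence[str]) -> Dict[str, float]:
    return TOY_SCORES.get(tuple(path), {STOP: 1.0})


print(generate_label_path(toy_score_next))                       # ['F', 'F01']
print(generate_label_path(toy_score_next, expert_prefix=["F"]))  # ['F', 'F01']
```

The output length varies with the scores: the path ends as soon as the stop label wins, which is one simple way to obtain a length-variant label sequence.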
Semi-supervised Domain Adaptation in Graph Transfer Learning
As a specific case of graph transfer learning, unsupervised domain adaptation
on graphs aims for knowledge transfer from label-rich source graphs to
unlabeled target graphs. However, graphs with topology and attributes usually
have considerable cross-domain disparity and there are numerous real-world
scenarios where merely a subset of nodes are labeled in the source graph. This
imposes critical challenges on graph transfer learning due to serious domain
shifts and label scarcity. To address these challenges, we propose a method
named Semi-supervised Graph Domain Adaptation (SGDA). To deal with the domain
shift, we add adaptive shift parameters to each of the source nodes, which are
trained in an adversarial manner to align the cross-domain distributions of
node embeddings, so that the node classifier trained on labeled source nodes can be
transferred to the target nodes. Moreover, to address the label scarcity, we
propose pseudo-labeling on unlabeled nodes, which improves classification on
the target graph via measuring the posterior influence of nodes based on their
relative position to the class centroids. Finally, extensive experiments on a
range of publicly accessible datasets validate the effectiveness of our
proposed SGDA in different experimental settings.
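To make the idea of weighting pseudo-labels by a node's position relative to class centroids concrete, here is a minimal Python sketch. The softmax over negative Euclidean distances used as the weight is an assumption chosen for simplicity; the abstract does not specify the formula, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def class_centroids(embeddings: List[Vector], labels: List[int]) -> Dict[int, List[float]]:
    """Average the embeddings of each class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for vec, lab in zip(embeddings, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [x / counts[lab] for x in acc] for lab, acc in sums.items()}


def pseudo_label(vec: Vector, centroids: Dict[int, List[float]]) -> Tuple[int, float]:
    """Return (nearest class, weight); the weight is an assumed softmax over -distance."""
    dists = {lab: math.dist(vec, c) for lab, c in centroids.items()}
    label = min(dists, key=dists.get)
    norm = sum(math.exp(-d) for d in dists.values())
    return label, math.exp(-dists[label]) / norm


cents = class_centroids([[0, 0], [0, 2], [4, 0], [4, 2]], [0, 0, 1, 1])
label, weight = pseudo_label([1, 1], cents)
print(label, round(weight, 3))  # 0 0.881
```

Nodes close to one centroid and far from the others receive a weight near 1, while nodes lying between centroids receive a weight near the uniform value, so their pseudo-labels influence training less.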
Kernel-based Substructure Exploration for Next POI Recommendation
Point-of-Interest (POI) recommendation, which benefits from the proliferation
of GPS-enabled devices and location-based social networks (LBSNs), plays an
increasingly important role in recommender systems. It aims to help users
discover places they are interested in visiting, based on
previous visits and current status. Most existing methods merely
leverage recurrent neural networks (RNNs) to explore sequential influences for
recommendation. Despite their effectiveness, these methods not only neglect
topological geographical influences among POIs, but also fail to model
high-order sequential substructures. To tackle the above issues, we propose a
Kernel-Based Graph Neural Network (KBGNN) for next POI recommendation, which
combines the characteristics of both geographical and sequential influences in
a collaborative way. KBGNN consists of a geographical module and a sequential
module. On the one hand, we construct a geographical graph and leverage a
message passing neural network to capture the topological geographical
influences. On the other hand, we explore high-order sequential substructures
in the user-aware sequential graph using a graph kernel neural network to
capture user preferences. Finally, a consistency learning framework is
introduced to jointly incorporate geographical and sequential information
extracted from two separate graphs. In this way, the two modules effectively
exchange knowledge to mutually enhance each other. Extensive experiments
conducted on two real-world LBSN datasets demonstrate the superior performance
of our proposed method over the state of the art. Our code is available at
https://github.com/Fang6ang/KBGNN.
Comment: Accepted by the IEEE International Conference on Data Mining (ICDM)
202
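The geographical module described in this abstract (a POI graph plus message passing) can be sketched in a few lines of Python. Connecting POIs that lie within a fixed radius and using unweighted mean aggregation are assumptions for illustration; the abstract does not state how the authors build the graph or define the messages, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def build_geo_graph(coords: Sequence[Tuple[float, float]], radius_km: float) -> Dict[int, List[int]]:
    """Link every pair of POIs closer than radius_km (assumed construction rule)."""
    adj: Dict[int, List[int]] = {i: [] for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if haversine_km(coords[i], coords[j]) <= radius_km:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def mean_message_pass(features: List[List[float]], adj: Dict[int, List[int]]) -> List[List[float]]:
    """One round of message passing: average a POI's features with its neighbors'."""
    out = []
    for i, feat in enumerate(features):
        group = [feat] + [features[j] for j in adj[i]]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


coords = [(40.0, -74.0), (40.001, -74.0), (41.0, -74.0)]
graph = build_geo_graph(coords, radius_km=1.0)
print(graph)                                             # {0: [1], 1: [0], 2: []}
print(mean_message_pass([[1.0], [3.0], [10.0]], graph))  # [[2.0], [2.0], [10.0]]
```

After one round, nearby POIs share information while the distant POI keeps its own features, which is the kind of topological geographical influence the abstract refers to.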
Interdisciplinary Fairness in Imbalanced Research Proposal Topic Inference: A Hierarchical Transformer-based Method with Selective Interpolation
Topic inference for research proposals aims to obtain the
most suitable disciplinary division from the discipline system defined by a
funding agency. The agency then finds appropriate peer review
experts in its database based on this division. Automated topic inference
can reduce human errors caused by manual topic filling, bridge the knowledge
gap between funding agencies and project applicants, and improve system
efficiency. Existing methods focus on modeling this as a hierarchical
multi-label classification problem, using generative models to iteratively
infer the most appropriate topic information. However, these methods overlook
the gap in scale between interdisciplinary research proposals and
non-interdisciplinary ones, leading to an unjust phenomenon where the automated
inference system categorizes interdisciplinary proposals as
non-interdisciplinary, causing unfairness during the expert assignment. How can
we address this data imbalance issue under a complex discipline system and
hence resolve this unfairness? In this paper, we implement a topic label
inference system based on a Transformer encoder-decoder architecture.
Furthermore, we utilize interpolation techniques to create a series of
pseudo-interdisciplinary proposals from non-interdisciplinary ones during
training based on non-parametric indicators such as cross-topic probabilities
and topic occurrence probabilities. This approach aims to reduce the bias of
the system during model training. Finally, we conduct extensive experiments on
a real-world dataset to verify the effectiveness of the proposed method. The
experimental results demonstrate that our training strategy can significantly
mitigate the unfairness generated in the topic inference task.
Comment: 19 pages, Under review. arXiv admin note: text overlap with
arXiv:2209.1391
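The interpolation step in this abstract resembles mixup-style augmentation, which the following minimal Python sketch illustrates. It blends two single-topic proposals into one pseudo-interdisciplinary example with soft topic labels. The fixed mixing coefficient `lam`, the feature vectors, and the topic names are assumptions; the selection of pairs by cross-topic and topic occurrence probabilities mentioned in the abstract is not implemented here.

```python
from typing import Dict, List, Sequence, Tuple


def interpolate_proposals(
    x_a: Sequence[float], topics_a: Sequence[str],
    x_b: Sequence[float], topics_b: Sequence[str],
    lam: float,
) -> Tuple[List[float], Dict[str, float]]:
    """Mixup-style blend of two proposals' features and topic labels."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y: Dict[str, float] = {}
    for topic in topics_a:
        y[topic] = y.get(topic, 0.0) + lam
    for topic in topics_b:
        y[topic] = y.get(topic, 0.0) + (1.0 - lam)
    return x, y


x, y = interpolate_proposals([1.0, 0.0], ["Math"], [0.0, 1.0], ["Bio"], lam=0.5)
print(x)  # [0.5, 0.5]
print(y)  # {'Math': 0.5, 'Bio': 0.5}
```

Training on such blended examples exposes the model to multi-topic targets more often than the raw data would, which is the intuition behind using interpolation to counter the imbalance.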
Graph Soft-Contrastive Learning via Neighborhood Ranking
Graph Contrastive Learning (GCL) has emerged as a promising approach in the
realm of graph self-supervised learning. Prevailing GCL methods mainly derive
from the principles of contrastive learning in the field of computer vision:
modeling invariance by specifying absolutely similar pairs. However, when
applied to graph data, this paradigm encounters two significant limitations:
(1) the validity of the generated views cannot be guaranteed: graph
perturbation may produce invalid views against semantics and intrinsic topology
of graph data; (2) specifying absolutely similar pairs in the graph views is
unreliable: for abstract and non-Euclidean graph data, it is difficult for
humans to decide the absolute similarity and dissimilarity intuitively. Despite
the notable performance of current GCL methods, these challenges necessitate a
reevaluation: Could GCL be more effectively tailored to the intrinsic
properties of graphs, rather than merely adopting principles from computer
vision? In response to this query, we propose a novel paradigm, Graph
Soft-Contrastive Learning (GSCL). This approach facilitates GCL via
neighborhood ranking, avoiding the need to specify absolutely similar pairs.
GSCL leverages the underlying graph characteristic of diminishing label
consistency, asserting that nodes that are closer in the graph are overall more
similar than far-distant nodes. Within the GSCL framework, we introduce
pairwise and listwise gated ranking InfoNCE loss functions to effectively
preserve the relative similarity ranking within neighborhoods. Moreover, as the
neighborhood size exponentially expands with more hops considered, we propose
neighborhood sampling strategies to improve learning efficiency. Our extensive
empirical results across 11 commonly used graph datasets (8 homophily
graphs and 3 heterophily graphs) demonstrate GSCL's superior performance
compared to 20 SOTA GCL methods.
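The neighborhood-ranking idea in this abstract can be sketched in Python as follows: nodes fewer hops from an anchor should score as more similar to it than nodes farther away. This sketch swaps in a plain margin-based hinge ranking loss in place of the gated ranking InfoNCE losses named in the abstract, and it enumerates all pairs instead of sampling neighborhoods; both are simplifying assumptions, and the function names are hypothetical.

```python
import math
from collections import deque
from typing import Dict, List, Sequence


def hop_distances(adj: Dict[int, List[int]], source: int) -> Dict[int, int]:
    """Breadth-first search giving the hop count from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def pairwise_rank_loss(emb: Dict[int, Sequence[float]], adj: Dict[int, List[int]],
                       anchor: int, margin: float = 0.1) -> float:
    """Hinge penalty whenever a farther node is not less similar than a nearer one."""
    dist = hop_distances(adj, anchor)
    others = [n for n in dist if n != anchor]
    loss, pairs = 0.0, 0
    for near in others:
        for far in others:
            if dist[near] < dist[far]:
                gap = cosine(emb[anchor], emb[far]) - cosine(emb[anchor], emb[near])
                loss += max(0.0, margin + gap)
                pairs += 1
    return loss / pairs if pairs else 0.0


path_graph = {0: [1], 1: [0, 2], 2: [1]}
good = {0: [1.0, 0.0], 1: [1.0, 0.1], 2: [0.0, 1.0]}  # similarity decays with hops
bad = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 0.0]}   # ranking is inverted
print(pairwise_rank_loss(good, path_graph, anchor=0))            # 0.0
print(round(pairwise_rank_loss(bad, path_graph, anchor=0), 6))   # 1.1
```

No pair is ever declared absolutely similar or dissimilar; the loss only constrains the relative order of similarities within the neighborhood.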
T-cell infiltration in the central nervous system and their association with brain calcification in Slc20a2-deficient mice
Primary familial brain calcification (PFBC) is a rare neurodegenerative and neuropsychiatric disorder characterized by bilateral symmetric intracranial calcification along the microvessels or inside neuronal cells in the basal ganglia, thalamus, and cerebellum. Slc20a2 homozygous (HO) knockout mice are the most commonly used model to simulate the brain calcification phenotype observed in human patients. However, the cellular and molecular mechanisms related to brain calcification, particularly at the early stage well before the emergence of brain calcification, remain largely unknown. In this study, we quantified the central nervous system (CNS)-infiltrating T-cells of different age groups of Slc20a2-HO and matched wild-type mice and found CD45+CD3+ T-cells to be significantly increased in the brain parenchyma, even in the pre-calcification stage of 1-month-old Slc20a2-HO mice. The accumulation of the CD3+ T-cells appeared to be associated with the severity of brain calcification. Further immunophenotyping revealed that the two main subtypes that had increased in the brain were CD3+ CD4− CD8− and CD3+ CD4+ T-cells. The expression of endothelial cell (EC) adhesion molecules increased, while that of tight and adherens junction proteins decreased, providing the molecular precondition for T-cell recruitment to ECs and paracellular migration into the brain. The fusion of lymphocytes and EC membranes and transcellular migration of CD3-related gold particles were captured, suggesting enhancement of transcytosis in the brain ECs. Exogenous fluorescent tracers and endogenous IgG and albumin leakage also revealed an impairment of the transcellular pathway in the ECs. FTY720 significantly alleviated brain calcification, probably by reducing T-cell infiltration, modulating neuroinflammation and the ossification process, and enhancing the autophagy and phagocytosis of CNS-resident immune cells.
This study clearly demonstrated CNS-infiltrating T-cells to be associated with the progression of brain calcification. Impairment of blood–brain barrier (BBB) permeability, which was closely related to T-cell invasion into the CNS, could be explained by BBB alterations, namely an increase in the paracellular and transcellular pathways of brain ECs. FTY720 was found to be a potential drug to protect patients from PFBC-related lesions in the future.
A Comprehensive Survey on Deep Graph Representation Learning
Graph representation learning aims to effectively encode high-dimensional
sparse graph-structured data into low-dimensional dense vectors, which is a
fundamental task that has been widely studied in a range of fields, including
machine learning and data mining. Classic graph embedding methods follow the
basic idea that the embedding vectors of interconnected nodes in the graph can
still maintain a relatively close distance, thereby preserving the structural
information between the nodes in the graph. However, this is sub-optimal
because: (i) traditional methods have limited model capacity, which limits
learning performance; (ii) existing techniques typically rely on unsupervised
learning strategies and fail to couple with the latest learning paradigms;
(iii) representation learning and downstream tasks depend on each other
and should be jointly enhanced. With the remarkable success of deep learning,
deep graph representation learning has shown great potential and advantages
over shallow (traditional) methods, and a large number of deep graph
representation learning techniques have been proposed in the past decade,
especially graph neural networks. In this survey, we comprehensively
review current deep graph representation learning algorithms by proposing a
new taxonomy of existing state-of-the-art literature. Specifically, we
systematically summarize the essential components of graph representation
learning and categorize existing approaches by graph neural network
architecture and by the most recent advanced learning paradigms. Moreover, this
survey also presents practical and promising applications of deep graph
representation learning. Last but not least, we state new perspectives and
suggest challenging directions that deserve further investigation in the
future.
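As one concrete instance of the graph neural networks this survey covers, the Python sketch below performs a single propagation step with the symmetric normalization commonly used in graph convolutional networks, D^{-1/2}(A + I)D^{-1/2} X. The learned weight matrix and nonlinearity are omitted, and the function name is hypothetical; this is an illustration under those assumptions, not a method taken from the survey.

```python
import math
from typing import List


def gcn_propagate(adj_matrix: List[List[float]], features: List[List[float]]) -> List[List[float]]:
    """One propagation step: D^{-1/2} (A + I) D^{-1/2} X, without learned weights."""
    n = len(adj_matrix)
    # Add self-loops so each node keeps part of its own signal.
    a_hat = [[adj_matrix[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    out = []
    for i in range(n):
        row = [0.0] * len(features[0])
        for j in range(n):
            if a_hat[i][j]:
                weight = a_hat[i][j] / math.sqrt(deg[i] * deg[j])
                for k, x in enumerate(features[j]):
                    row[k] += weight * x
        out.append(row)
    return out


adjacency = [[0.0, 1.0], [1.0, 0.0]]  # two connected nodes
print(gcn_propagate(adjacency, [[1.0], [3.0]]))  # [[2.0], [2.0]]
```

Connected nodes end up with closer representations after propagation, which matches the classic embedding intuition stated in the abstract that interconnected nodes should stay near each other.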